library(qtlcharts)
library(CountClust)
library(parallel)
library(cellcycleR)
library(data.table)
library(binhf)
library(vioplot)
library(limma)
library(readxl)
In this script, we perform the cellcycleR method on the iPSC cells data collected by Po Yuan in Yoav’s lab. The data is still not publicly available, but you can check out John Blischak’s website for the information on the data collection, the preliminary analysis Yoav’a lab have been focussing on so far. For this data, unlike for Oscope and the Marioni data, the other two datasets we are looking at, we do not have any information on the cell phases. The cell phases were estimated using a very ad-hoc approach by Macosko et al, see their paper here.
We have batch corected and individual corrected the expression levels for the iPSC samples. Also, we have fitted admixture models to learn about the distinctive behavior across the different phases. Check our webpage for further details. We now load the batch corrected gene expression data.
setwd('/Users/kushal/Documents/singleCell-method/project/analysis')
molecules_single_cell_cycle <- read.table("../data/molecules_ipsc_single_cell_cycle.txt");
cycle_counts_data <- t(molecules_single_cell_cycle);
dim(cycle_counts_data)
## [1] 578 543
Next, we perform voom to log normalize the gene expression across the cells and then for each gene, we perform mean correction and standardization.
cycle_voom_data <- voom(cycle_counts_data)$E;
cycle_data_norm <- apply(cycle_voom_data,2,function(x) return (x-mean(x))/sd(x))
celltime_levels <- 100;
cycle_data_norm <- cycle_data_norm[, -which(colSums(cycle_data_norm)==0)]
dim(cycle_data_norm)
## [1] 578 541
The cell phases as assigned using Macosko method
cell_phases <- as.vector(as.matrix(read.table("../data/cell_phase_vector_yoav.txt")));
out <- cell_ordering_class(cycle_data_norm, celltime_levels = 100, num_iter=100, save_path="../rdas/cell_order_ipsc.rda")
We ran the method above once already (took around 5 minutes) and now we just load the output.
out <- get(load(file="../rdas/cell_order_ipsc.rda"));
cell_order_full <- cell_ordering_full(out$signal_intensity, dim(cycle_data_norm)[2])
We look at the plots of the amplitudes, phases and the non signal variation of the genes.
amp_genes <- out$amp;
sd_genes <- out$sigma;
phi_genes <- out$phi;
plot(density(phi_genes), col="red", main="Density plot of the phases")
plot(density(amp_genes[-which.max(amp_genes)]), col="red", main="Density plot of the amplitudes")
plot(density(sd_genes[-which.max(sd_genes)]), col="red", main="Density plot of the non-signal sd")
We extract the genes with high SNR - these are more likely to be sinusoidal.
ESS <- amp_genes^2; RSS <- sd_genes^2
SNR <- ESS/RSS;
plot(SNR, col="red", pch=20, lwd=1)
top_genes <- which(SNR > 3);
Next we plot the qtlcharts for these top sinusoidal genes and see if their patterns are indeed sinusoidal or not.
iplotCurves(t(cycle_data_norm[order(cell_order_full),top_genes]))
## Set screen size to height=700 x width=1000
We also provide the violin plot of the cell orders estimated against the cell phase labels vector obatiend using Macosko method. There are 96 cells from G1, S and G2. So, we perform qtlcharts keeping the relative order of the sample and see how the patterns look like (since numbers for three phases are same - 96, you can view it as uniformly split).
vioplot(cell_order_full[which(cell_phases=="G1.S")],
cell_order_full[which(cell_phases=="S")],
cell_order_full[which(cell_phases=="G2.M")],
cell_order_full[which(cell_phases=="M")],
cell_order_full[which(cell_phases=="M.G1")],
names=c("G1","S","G2M","M","M.G1"),
col="red")
iplotCurves(t(cycle_data_norm[c(which(cell_phases=="G1"),which(cell_phases=="S"), which(cell_phases=="G2.M"),which(cell_phases=="M"),which(cell_phases=="M.G1")),top_genes]))
We extract the high SNR genes as they seem to have sinusoidal patterns and repeat the procedure again. The aim here is to see how much robust the analysis is to the selction of the (most) sinusoidal genes.
snr_high_indices <- which(SNR > 1);
cycle_data_norm_sinusoidal <- cycle_data_norm[,snr_high_indices];
dim(cycle_data_norm_sinusoidal)
## [1] 578 180
We apply the cell ordering mechanism (it takes around 5 minutes to run)
out2 <- cell_ordering_class(cycle_data_norm_sinusoidal, celltime_levels = 100, num_iter=100,
save_path="../rdas/cell_order_ipsc_sinusoidal.rda")
We reload the output
out2 <- get(load(file="../rdas/cell_order_ipsc_sinusoidal.rda"));
cell_order_full <- cell_ordering_full(out2$signal_intensity, dim(cycle_data_norm_sinusoidal)[2])
We plot the same features as above and check for the robustness. We needed to shift the cell order so as to compare with previous plot on all genes as the method is non-identifiable upto a rotation.
amp_genes <- out2$amp;
sd_genes <- out2$sigma;
phi_genes <- out2$phi;
plot(density(phi_genes), col="red", main="Density plot of the phases")
plot(density(amp_genes[-which.max(amp_genes)]), col="red", main="Density plot of the amplitudes")
plot(density(sd_genes[-which.max(sd_genes)]), col="red", main="Density plot of the non-signal sd")
ESS <- amp_genes^2; RSS <- sd_genes^2
SNR <- ESS/RSS;
plot(SNR, col="red", pch=20, lwd=1)
top_genes <- which(SNR > 0.5);
new_cell_order <- shift(order(cell_order_full),40,dir="right")
iplotCurves(t(cycle_data_norm_sinusoidal[new_cell_order,top_genes]))
This looks very sinusoidal, there also do not seem to be outliers screwing up the analysis. Is this ordering conforming to Macosko method?
vioplot(cell_order_full[which(cell_phases=="G1.S")],
cell_order_full[which(cell_phases=="S")],
cell_order_full[which(cell_phases=="G2.M")],
cell_order_full[which(cell_phases=="M")],
cell_order_full[which(cell_phases=="M.G1")],
names=c("G1","S","G2M","M","M.G1"),
col="red")
It seems that our method seems to mainly distinguish the G2.M and M phases from the other phases, which is again plausible because in stem cells, we mainly have the S (synthesis) and the mitosis (M) phase.
Yoav’s data probably gives us the most visible sinusoidal patterns after cell ordering than the Macosko and OScope method (the latter I am sure will also give if we correct for the outliers), but it seems that all the sinusoidal genes kind of peak in G2.M and M phases, and we do not see any sinusoidal genes that peak in any of the other phases. Is this because the stem cells are likely to show synthesis (S) and the mitosis (M) phases only? The cell ordering is different from PCA, the patterns are very sinusoidal, so I am feeling more confident about the outcomes we see.